Introduction


Intelligent data analysis (IDA) is one of the progressive methods for analyzing large volumes of data. It is
the process of discovering previously unknown information or knowledge in existing data sets and then
applying it [1]. Its main tasks are classification, association, clustering, forecasting, and sequence analysis.


Tools for creating intelligent applications are represented by developments from Cognos, G2 from Gensym 
Corp, MineSet from Silicon Graphics, Intelligent Miner from IBM, and IDIS from Information Discovery. 


Universal IDA tools are quite complex and expensive, so they are not always used in integrated
end-user-oriented systems. Intelligent data analysis is used in many areas of modern society, helping to
solve a wide variety of problems, for example in insurance, banking, marketing, financial risk analysis,
monitoring of equipment and technological processes [9-16], telecommunications, and computer security
[17-29].


In the field of computer security, IDA methods are closely related to the creation of promising information
security systems (ISS). It is the IDA methodology that helps to implement in information security the
evolutionary properties of adaptation, self-organization, and learning, as well as the inheritance and
representation of the experience of information security experts [30-36] in the form of a system of fuzzy
rules accessible for analysis.


Intelligent data analysis in corporate information systems


The most complex modern information structures [35-38] aimed at large companies are corporate information
systems (CIS). They are characterized by the use of multiple computers, client-server architecture,
specialization of servers, the presence of a single information space, and an extensive network of data
reception and transmission. CIS databases contain huge volumes of data and have all the features of a
complex system organization. CIS information spaces include relational and object DBMSs, transactional
databases, time series and large-volume numerical data, and multidimensional OLAP storages.


Examples of commercial corporate systems are R/3 systems from SAP, Oracle Application, Microsoft
Business Solution Navision, the Parus system, the "Manufacturing Enterprise Management" application
solution for the 1C: Enterprise 8.0 system, corporate information systems from Atlas, and corporate
information projects based on Lotus Notes/Domino technology. In [2], hybrid intelligent systems are
considered that make it possible to effectively combine formalized and informal knowledge through the
integration of traditional artificial intelligence tools; that work also gives examples of the merging of
corporate and intelligent systems.


Let's supplement the earlier review with information about the use of IDA in commercial corporate systems. 


SAP's comprehensive business intelligence solution, which provides quick access to information for use in
making strategically important decisions, is the SAP Business Intelligence (SAP BI) subsystem.


The core of the solution is a data warehouse designed to store internal and external information, including 
documentation, video and audio clips. It integrates information across the entire SAP Business Suite 
platform and provides the ability to quickly respond to market changes, monitor indicators of key success 
factors, analyze and optimize enterprise performance based on a single business model. 


Oracle provides a full range of data mining products - from various tools to ready-made applications - and 
tailors their use according to the user's problems. 


The most popular tools are Oracle OLAP and Oracle Data Mining. OLAP (online analytical processing) tools
are useful when it comes to multidimensional indicators, their hierarchical aggregation and detailing, and
modeling.


Microsoft SQL Server 2022 provides an integrated environment for creating and working with data mining
models. This environment is called Microsoft SQL Server Analysis Services and consists of a set of special
tools (Business Intelligence Development Studio, SQL Server Management Studio, Microsoft SQL Server
2022 Integration Services). It includes data mining algorithms and tools that facilitate the development of a
comprehensive solution applicable to a wide variety of projects.


The 1C: Enterprise 8.0 system has a special tool, the “Data Analysis Subsystem”, which can be built into any
platform configuration.


It is designed to help users of a corporate information system find answers to non-trivial questions. It
provides automated transformation of data accumulated in the corporate information system into practically
useful and well-interpreted patterns: it groups relatively similar objects, searches for stable combinations
of events and objects (associations), and builds a cause-and-effect hierarchy of conditions leading to
certain decisions (decision tree).


A feature of many commercial corporate systems is that security systems are not initially included in them
and, although such tools are available, must be selected and purchased separately.


This leads to additional costs (financial, time, labor, material, etc.) for purchasing, setting up and
operating the systems, and it requires agreeing on how two dissimilar systems are to be integrated.


Intelligent data analysis in ensuring information security 


Modern computer systems and networks are in a state of constant development and modification, and the 
volume of analyzed data in the world doubles every year. Therefore, to ensure the required level of 
information protection, it is necessary to respond flexibly and quickly to changing conditions, provide 
reliable protection taking into account constant changes in input influences, and prevent the actions of 
intruders, i.e. have adaptive and self-developing information security systems. 


The purpose of this work is to develop a methodology for using data mining to build an adaptive,
self-developing information security system in corporate systems.


The need to use data mining tools in the information security system of corporate systems stems from the 
heterogeneity of the structures of the information spaces of these systems; difficulties in obtaining analytical 
information from large databases; a large number of users simultaneously working in the system; 
requirements for constant monitoring of functioning and making informed management decisions, 
depending on many factors. 


The prerequisites for using IDA in a CIS are client-server technology, distributed databases, the availability 
of information storage facilities, the use of modern network technologies and a variety of tools used for 
collecting, processing, visualizing and analyzing data. 


A feature of information security systems in corporate systems is a combination of at least three problems: 
information security in computer networks; ensuring database security; ensuring the safe operation of 
automatic information processing systems [3]. 


Intelligent tools often used in computer networks include knowledge bases as part of expert systems, 
systems based on the Bayesian method, fuzzy logic systems, neural networks, evolutionary methods and 
hybrid intelligent systems. The main tasks solved by intelligent means of ensuring information security of a 
computer network are classification and clustering. Intelligent database security tools are described in [4],
which indicates that a database information security system must use the tools and objects of the applied
database management system (DBMS), database objects and tools, and a set of rules and events
characterizing user actions. According to [5], it is the recording of events that gives an idea of what each
user is interested in; that work also compiles a list of the main recorded events.


The means of ensuring the safe operation of information processing systems include intrusion prevention 
mechanisms, authorization, delimitation of access rights, cryptographic protection (on storage media, in 
networks, password protection), and management of user rights. To monitor the state of the system, these
means use signature databases of known attacks, take system logs and files as the main sources of
information, and analyze the contents of network traffic and files.
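

As an illustration of monitoring with a signature database, the following minimal Python sketch scans log
lines for known attack patterns. The two signatures and the sample log lines are hypothetical and are not
taken from the cited sources; a real signature database would be far larger and maintained separately.

```python
import re

# Hypothetical signature database: signature name -> regular expression.
SIGNATURES = {
    "sql_injection": re.compile(r"union\s+select", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./"),
}


def match_signatures(log_lines):
    """Return (line_number, signature_name) pairs for lines that match a signature."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


# Hypothetical log excerpt used only to exercise the function.
sample_log = [
    "GET /index.html 200",
    "GET /item?id=1 UNION SELECT password FROM users 200",
    "GET /../../etc/passwd 403",
]
alerts = match_signatures(sample_log)
```

Such matching only recognizes attacks that are already described in the database, which is the limitation
of the traditional approach discussed later in this section.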


The traditional approach to building a security system using IDA tools uses artificial neural networks, 
decision trees and classification algorithms, fuzzy clustering methods, association rules, limited search 
algorithms, and cluster analysis. 


Neural networks are used to monitor the traffic of a protected local network, search for hidden patterns in 
primary data arrays, and detect intrusions. To predict the value of the target indicator, sets of input variables, 
mathematical activation functions and weighting coefficients of the input parameters are used. Training is
iterative: the neural network modifies the weighting coefficients until the predicted output parameter
matches the actual value closely enough. After training, the neural network becomes a model that is used
for prediction.
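

The training loop described above can be sketched for the simplest case of a single neuron with a logistic
activation function. This is an illustrative Python sketch, not the network used in any of the cited systems;
the two input features, the learning rate, and the stopping tolerance are assumptions.

```python
import math
import random


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Weighted sum of the inputs passed through the activation function."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, rate=0.5, tolerance=0.05, max_epochs=10000, seed=0):
    """Adjust the weights until every prediction is within `tolerance` of its target."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(max_epochs):
        worst_error = 0.0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Gradient step for the logistic (cross-entropy) loss.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
            worst_error = max(worst_error, abs(error))
        if worst_error < tolerance:
            break
    return weights, bias


# Hypothetical features: (share of failed logins, normalized traffic volume); label 1 = intrusion.
samples = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.9, 0.7], 1)]
weights, bias = train(samples)
score_attack = predict(weights, bias, [0.85, 0.8])
score_normal = predict(weights, bias, [0.15, 0.15])
```

After training, `predict` plays the role of the model: values close to 1 indicate a suspected intrusion,
values close to 0 indicate normal activity.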


Classification mechanisms are used at the initial level, for example, to systematize protection methods
(fuzzy conclusions) according to a vector of fuzzy threats. If the reliability of the classification for known
threats falls below a certain level while there are signs of an attack, the classification is expanded by
introducing a new gradation, that is, the problem of threat clustering is solved. Associations reveal
cause-and-effect relationships and determine probabilities or confidence coefficients, allowing
appropriate conclusions to be drawn.
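

The confidence coefficient of an association can be illustrated with a short Python sketch that computes the
support and confidence of a rule "antecedent -> consequent" over recorded sessions. The event names and the
sessions are hypothetical. Note that confidence measures how often events occur together; treating it as a
cause-and-effect relationship is an interpretation that has to be justified separately.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Sessions that contain every event of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Sessions that contain the antecedent and the consequent together.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical sessions, each a set of recorded events.
sessions = [
    {"failed_login", "port_scan", "attack"},
    {"failed_login", "attack"},
    {"failed_login"},
    {"port_scan"},
]
support, confidence = rule_metrics(sessions, {"failed_login"}, {"attack"})
```

Here the rule holds in two of the four sessions (support 0.5) and in two of the three sessions that contain
a failed login (confidence 2/3).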


Most publications on the use of intelligent information security systems are devoted to attack detection 
systems based on the model proposed by Denning. The model contains a set of profiles for legitimate users, 
compares current activities with the corresponding profile, updates the profile, and reports any anomalies 
detected. 
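

The profile-based scheme can be sketched in Python as follows. This is a simplified illustration in the
spirit of the model, not Denning's original formulation: the profile keeps a running mean and variance of a
single activity measure, and the deviation threshold, the minimum number of samples, and the observations
are assumptions.

```python
import math


class UserProfile:
    """Running mean and variance of one activity measure (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def is_anomalous(self, value, threshold=3.0, min_samples=5):
        """Flag values further than `threshold` standard deviations from the mean."""
        if self.count < min_samples:
            return False  # not enough history to judge
        std = math.sqrt(self.m2 / (self.count - 1))
        if std == 0.0:
            return value != self.mean
        return abs(value - self.mean) / std > threshold


def monitor(profile, observations):
    """Compare each observation with the profile, then update it or report an anomaly."""
    anomalies = []
    for value in observations:
        if profile.is_anomalous(value):
            anomalies.append(value)
        else:
            profile.update(value)
    return anomalies


# Hypothetical number of logins per hour for one user.
anomalies = monitor(UserProfile(), [4, 5, 6, 5, 4, 5, 60, 5])
```

In this sketch only observations judged normal update the profile, so that a single burst of activity does
not shift the baseline; other update policies are possible.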


The disadvantages of the traditional approach are: 


1. Knowledge bases are formed by experts, i.e. the principle of including situations in them is subjective.


2. Knowledge bases must be periodically updated, organized, and systematized, which is a labor-intensive 
and expensive procedure. 


3. With the traditional approach, there is a time delay between the appearance of a new attack and means of 
protection against it (lagged counteraction). 


4. Attacks are constantly modified, improved, “disguised” as standard procedures, which requires constant 
improvement and complication of security measures. 


Taking into account the above, the problem of the evolutionary development of information security systems 
(ISS) is relevant. 


Construction of an adaptive self-developing protection system 


Building an intelligent data analysis model is part of a larger process that involves everything from 
formulating data selection and storage issues to model creation to deploying the model to production. Let's 
move on to describe the features of building an adaptive self-developing system. 


Table 1 provides a list of the main data sources and the information they contain to be analyzed. 


Table 1. Sources of analyzed data


Data source                                    Analyzed information
---------------------------------------------  --------------------------------------------------
log files of running subsystems                time and type of operations performed, essence of
                                               operations, password compliance, failures when
                                               establishing communication with a remote machine,
                                               emergency stop diagnostics
network traffic                                loading of network equipment, use of communication
                                               channels, network activity
directories and logs of registration of users  user ID codes, correctness of passwords, actions
and events                                     performed
lists of functional tasks                      chains of interconnected calls to tasks and
                                               processes
access rights information                      compliance with the regulations for accessing
                                               resources
information about the operation of the mail    statistics, volumes and targeting of mailings and
system                                         postal receipts, subject of messages
text files                                     thematic focus
Excel workbooks                                security, presence/absence of macros
tables with attributes of executable files     file types, creation and modification dates,
                                               authors of changes and their rights, control of
                                               “immutability”, addresses of reference modules,
                                               checksums


Sources of data for analysis are system event logs, temporary files of servers and workstations, log files of
running subsystems, network traffic, directories and logs of users and events, lists of functional tasks and
information about access rights, information about the operation of the mail system, text files, Excel
workbooks, emails, tables with attributes of executable files, etc.


Working hypotheses: 


1. User activity, targeted access to system resources, and processes occurring in the system can be recorded,
and an adequate model can be built from these records.


2. An event (or sequence of events) corresponding to a generalized attack model is truly an attack, and the
use of anticipatory or simultaneous counteraction algorithms is justified.


3. The system can monitor the operation of the system software and, if it detects damage, restore protection
and automatically reload lost or damaged files.


The mechanism for using IDA for adaptive self-developing information security can be divided into a 
number of stages. 


1. Statement of the problem. At this stage, the requirements are analyzed, and the problems to be solved, the
metrics by which the model will be evaluated, and the tasks for the data mining project are defined.


This stage examines the levels of data confidentiality, user needs and rights in relation to available data, 
identification and authentication methods traditionally used in the enterprise. 


In this case, system information security risks can be defined as a function of three variables: 


- the likelihood of the existence of threats (potentially possible events, intentional or accidental, that could
have an undesirable impact on the corporate system or its parts, or on information assets and, as a result,
on the company’s business processes);


- the likelihood of the existence of vulnerabilities (deficiencies in the system that make it open to
unwanted influence from intruders, unqualified personnel or malicious code);


- potential losses, that is, direct and indirect financial losses resulting from the implementation of threats
and the presence of vulnerabilities.
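

The text states only that risk is a function of these three variables. One common concrete form, assumed
here purely for illustration, is their product; the Python sketch below uses it to rank hypothetical assets.
The asset names, probabilities, and loss figures are invented for the example.

```python
def risk_score(p_threat, p_vulnerability, potential_loss):
    """Risk as the product of the three variables (an assumed form, not taken from the source)."""
    for p in (p_threat, p_vulnerability):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if potential_loss < 0:
        raise ValueError("potential loss must be non-negative")
    return p_threat * p_vulnerability * potential_loss


# Hypothetical assets: (threat probability, vulnerability probability, potential loss).
assets = {
    "mail server": (0.6, 0.3, 20000),
    "customer database": (0.2, 0.5, 100000),
    "intranet portal": (0.4, 0.4, 5000),
}
ranking = sorted(assets, key=lambda name: risk_score(*assets[name]), reverse=True)
```

With these figures the customer database ranks first, because its large potential loss outweighs the lower
probability of a threat.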


At the same time, the decision to expand the classifications of attacks and protection mechanisms is made in 
accordance with the system of assessing the reliability of neutralizing threats in the context of individual 
protection mechanisms. The feasibility of using a protection mechanism as part of multi-level information 
security systems can be justified, for example, using the matrix of reliability of using protection mechanisms 
to neutralize threats [6]: 


where m_ij are the elements of the reliability matrix “threats - protection mechanisms”.
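

A matrix of this kind could be used as in the following Python sketch. The matrix from [6] is not reproduced
here, so every value is hypothetical, as are the threat and mechanism names, the assumption that rows
correspond to threats and columns to protection mechanisms, and the required reliability level.

```python
THREATS = ["password guessing", "malware", "traffic interception"]
MECHANISMS = ["authentication", "antivirus", "encryption"]

# Hypothetical reliability values m_ij in [0, 1]: row i is a threat, column j is a mechanism.
M = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.1],
    [0.3, 0.0, 0.6],
]


def best_mechanisms(matrix, threats, mechanisms, required=0.7):
    """For each threat, pick the most reliable mechanism and check the required level."""
    plan = {}
    for i, threat in enumerate(threats):
        j = max(range(len(mechanisms)), key=lambda col: matrix[i][col])
        plan[threat] = (mechanisms[j], matrix[i][j], matrix[i][j] >= required)
    return plan


plan = best_mechanisms(M, THREATS, MECHANISMS)
# Threats whose best mechanism is not reliable enough are candidates for extending
# the classification of attacks and protection mechanisms.
uncovered = [threat for threat, (_, _, ok) in plan.items() if not ok]
```

In this example traffic interception is not neutralized reliably enough by any single mechanism, which is the
kind of situation in which the decision to expand the classifications would be considered.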


Conclusion 


Intelligent data analysis is a necessary and modern addition to such a large information structure as a 
corporate system. One of its components is the information security system. Security means must be 
constantly improved and developed, which is why the mechanism proposed in the work for constructing an 
adaptive self-developing information security system is relevant, and the use of fast algorithms along with 
IDA will increase the efficiency of the system, which is the topic of a separate study. 